Relationship between temperature and load

Read in the TS objects. We set the dates as the index and drop the time/date column and the X column. The temperature file has 11 zones; I assume these correspond to the first 11 zones in the load time series.

Because we have so many zones, we may want to simplify the data and use only an aggregate over all zones. If we do not, the temperature-vs-load plot splits into one cluster per zone and the time-series plot has many overlapping lines. A simple mean is reasonable since all zones share the same units.
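Collapsing the zones can be sketched as below; `load_df` is a hypothetical stand-in for the per-zone load data frame, filled here with synthetic values:

```r
# A minimal sketch of collapsing per-zone load columns into one series.
# `load_df` is a hypothetical name; here it holds synthetic data with
# one numeric column per zone.
load_df <- as.data.frame(matrix(rnorm(24 * 20, mean = 100), ncol = 20))

load_total <- rowSums(load_df)   # total system load per hour
load_mean  <- rowMeans(load_df)  # mean over zones; valid since units match
```

The sum and the mean differ only by the constant number of zones, so either works for looking at the overall shape of the series.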

The figure shows a clear relationship between temperature and load: as the temperature moves away from the mean (around 56 °F) in either direction, the energy load increases.

We make time series objects with both zoo and ts. The zoo object keeps the actual timestamps, while the ts object only knows the ordering. load_zoo_zones contains all the zones separately, while load_zoo contains the sum of the loads at each time step.
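The difference between the two objects can be sketched as follows (the start date and series length are assumptions, and the data is synthetic):

```r
library(zoo)

# Hourly timestamps for two days (start date is an assumption)
hours <- seq(as.POSIXct("2004-01-01 01:00:00"), by = "hour", length.out = 48)
x <- rnorm(48)

load_zoo <- zoo(x, order.by = hours)  # keeps the actual timestamps
load_ts  <- ts(x, frequency = 24)     # only knows ordering and period

index(load_zoo)[1]  # a real timestamp
time(load_ts)[1]    # just a position on an abstract axis
```

Because zoo carries the timestamps, operations like subsetting by date or aligning load against temperature stay straightforward.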


Here we take 2004 as a test year and look at how each zone behaves throughout it. Because the first figure shows large differences in the magnitude of load between zones, we standardize the series and plot them again. The standardized plot is much easier to read.
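The standardization step amounts to column-wise centering and scaling, e.g. with `scale()`; the `zones` matrix here is synthetic, standing in for the 2004 hourly loads:

```r
# Sketch: per-zone standardization so all zones plot on a comparable scale.
# `zones` is synthetic stand-in data, one column per zone.
set.seed(1)
zones <- matrix(rnorm(24 * 11, mean = 1000, sd = 200), ncol = 11)

zones_std <- scale(zones)  # column-wise: subtract mean, divide by sd
```

After this, every zone has mean 0 and standard deviation 1, so the lines share one vertical scale.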

## [1] "2004-01-01 01:00:00 CET"


Smoothing

# Sum the load over all zones and compare the years

smoothpars <- c(0, 24, 24*7, 24*30)  # 0 for hourly, 24 for daily, 24*7 for weekly, 24*30 for monthly
years <- c("2004", "2005", "2006", "2007", "2008")

size_year <- numeric(length(years))
for (i in seq_along(years)) {
  year <- years[i]
  ystart <- which(index(load_zoo) == as.POSIXct(paste0(year, "-01-01 01:00:00")))
  if (i == length(years)) {
    yend <- length(load_zoo)  # last year runs to the end of the data
  } else {
    # -1 to drop the first value of the next year
    yend <- which(index(load_zoo) == as.POSIXct(paste0(years[i + 1], "-01-01 01:00:00"))) - 1
  }
  size_year[i] <- yend - ystart + 1
}
  




for (s in smoothpars){
  
  year_zoo <- matrix(data = NA, nrow = length(years), ncol = max(size_year))

  for (i in seq_along(years)) {
    year <- years[i]
    ystart <- which(index(load_zoo) == as.POSIXct(paste0(year, "-01-01 01:00:00")))
    if (i == length(years)) {
      yend <- length(load_zoo)  # last year runs to the end of the data
    } else {
      # -1 to drop the first value of the next year
      yend <- which(index(load_zoo) == as.POSIXct(paste0(years[i + 1], "-01-01 01:00:00"))) - 1
    }
    temp <- coredata(load_zoo[ystart:yend])
    year_zoo[i, 1:length(temp)] <- temp
  }
  year_zoo <- as.matrix(coredata(year_zoo))
  
  
  
  if (s > 0) {
    ## Average over non-overlapping blocks of s hours
    year_zoo_smooth <- matrix(data = NA, nrow = nrow(year_zoo), ncol = ceiling(max(size_year)/s))
    idx1 <- 1
    idx2 <- 1
    while (idx2 <= ncol(year_zoo) - (s - 1)) {
      temp <- year_zoo[, idx2:(idx2 + s - 1)]
      year_zoo_smooth[, idx1] <- rowMeans(temp)
      idx2 <- idx2 + s
      idx1 <- idx1 + 1
    }
    year_zoo <- year_zoo_smooth
  }



p1 <- image(x = 1:nrow(year_zoo),
      y = 1:ncol(year_zoo),
      z = year_zoo,
      xlab = "Year", ylab = "Time [steps]", main = "'Heatmap' of load by year")




year_zoo <- t(year_zoo)
colnames(year_zoo) <- c("year1", "year2", "year3", "year4", "year5")
year_zoo <- as.data.frame(year_zoo)


p2 <- ggplot(data=year_zoo, aes(x=index(year_zoo))) + 
                geom_line(aes(y=year1), color = "Yellow") + 
                geom_line(aes(y=year2), color = "Red") + 
                geom_line(aes(y=year3), color = "Green") + 
                geom_line(aes(y=year4), color = "Blue") + 
                geom_line(aes(y=year5), color = "Black") +
                xlab('Time') +
                ylab('Load')

p1
print(p2)
}


The last graph is really noisy. We will take a daily average and then plot it again.
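The manual while-loop above can also be expressed with zoo's `rollapply`, which handles the non-overlapping block means directly (the series here is synthetic; the real input would be the hourly load):

```r
library(zoo)

# A week of synthetic hourly load as a stand-in for load_zoo
hours <- seq(as.POSIXct("2004-01-01 01:00:00"), by = "hour", length.out = 24 * 7)
load_zoo <- zoo(100 + rnorm(24 * 7), order.by = hours)

# Non-overlapping 24-hour means: one value per day, much less noisy
load_daily <- rollapply(load_zoo, width = 24, by = 24, FUN = mean, align = "left")
```

Setting `by = width` makes the windows non-overlapping, which is exactly the block-averaging the loop implements.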

Time series analysis

We use the zoo library because the ts class does not support hourly collected data well. First, the dates/hours are formatted, and then a TS object is made from them. This is done for both load and temperature. Then the data is plotted. The y axis of the ggplot has to be doubled so that the left axis shows load in W and the right shows temperature in °F, for example.
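A hedged sketch of the dual-axis idea with ggplot2's `sec_axis`; the data and the scale factor `k` (which maps temperature onto the load scale) are made up:

```r
library(ggplot2)

# Synthetic load/temperature pair just to illustrate the axis mechanics
df <- data.frame(time = 1:48,
                 load = 100 + 10 * sin((1:48) / 4),
                 temp = 55 + 15 * cos((1:48) / 4))
k <- 2  # hypothetical factor relating the temperature scale to the load scale

p <- ggplot(df, aes(x = time)) +
  geom_line(aes(y = load)) +
  geom_line(aes(y = temp * k), linetype = "dashed") +
  scale_y_continuous(name = "Load [W]",
                     sec.axis = sec_axis(~ . / k, name = "Temperature [°F]"))
```

ggplot2 only allows a secondary axis that is a one-to-one transformation of the primary one, hence the multiply-by-`k` / divide-by-`k` pair.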

Forecast Library

Autocorrelation

Here, we investigate how values in the time series are correlated with one another. We do this for different amounts of lag to see whether the data is seasonal. In the following figure we can see that at lag 1 the autocorrelation is highest (not far from 1). As the lag increases, the correlation decreases until around lag 22-24, where it increases a little again. In the ACF plot this can be seen more clearly.

If we go to even more lags, the data's 24-hour period can be seen more clearly.
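The ACF over a week of lags can be computed as below; the series is a synthetic hourly-like signal with a built-in 24-hour cycle standing in for the load data:

```r
# Synthetic hourly series with a 24-hour cycle (stand-in for the load data)
set.seed(1)
x <- sin(2 * pi * (1:500) / 24) + rnorm(500, sd = 0.3)

a <- acf(x, lag.max = 24 * 7, plot = FALSE)
# a$acf[1] is lag 0; a local peak reappears near every multiple of lag 24
```

With `plot = FALSE` the correlations are returned for inspection; calling `acf` without it draws the usual bar plot with the significance bounds.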


Partial autocorrelation

There seems to be a seasonal period of 24 hours. Also, as the lag increases, another seasonal period of 7 days appears. Now to the partial autocorrelation.

There are significant partial autocorrelations at lags 1-3, 13-17, and 24-26. At higher lags there are spikes every 24 hours and every 7 days, confirming both the 24-hour seasonal period and a weekly one.
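The PACF with enough lags to cover both periods can be sketched the same way, again on a synthetic stand-in series:

```r
# PACF on a synthetic hourly-like series, with enough lags to cover
# both the daily and (in principle) the weekly period
set.seed(1)
x <- sin(2 * pi * (1:1000) / 24) + rnorm(1000, sd = 0.3)

p <- pacf(x, lag.max = 24 * 8, plot = FALSE)
# p$acf runs over lags 1 .. 24*8; unlike acf(), pacf() has no lag-0 term
```

The partial autocorrelation removes the effect of the intermediate lags, which is why the daily spikes stand out as isolated bars instead of a smooth wave.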

First plot:

Second plot: the roughness of the plot is due to the 24-hour seasonal period. This plot also shows that the load is higher at the end of the week.

Third plot:
